Exploiting foreign resources for DNN-based ASR

نویسندگان

  • Petr Motlícek
  • David Imseng
  • Blaise Potard
  • Philip N. Garner
  • Ivan Himawan
چکیده

Manual transcription of audio databases for the development of automatic speech recognition (ASR) systems is a costly and time-consuming process. In the context of deriving acoustic models adapted to a specific application, or in low-resource scenarios, it is therefore essential to explore alternatives capable of improving speech recognition results. In this paper, we investigate the relevance of foreign data characteristics, in particular domain and language, when using this data as an auxiliary data source for training ASR acoustic models based on deep neural networks (DNNs). The acoustic models are evaluated on a challenging bilingual database within the scope of the MediaParl project. Experimental results suggest that in-language (but out-of-domain) data is more beneficial than in-domain (but out-of-language) data when employed in either supervised or semi-supervised training of DNNs. The best performing ASR system, an HMM/GMM acoustic model that exploits DNN as a discriminatively trained feature extractor outperforms the best performing HMM/DNN hybrid by about 5% relative (in terms of WER). An accumulated relative gain with respect to the MFCC-HMM/GMM baseline is about 30% WER.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation

Different training and adaptation techniques for multilingual Automatic Speech Recognition (ASR) are explored in the context of hybrid systems, exploiting Deep Neural Networks (DNN) and Hidden Markov Models (HMM). In multilingual DNN training, the hidden layers (possibly extracting bottleneck features) are usually shared across languages, and the output layer can either model multiple sets of l...

متن کامل

Integration of DNN based speech enhancement and ASR

Speech enhancement employing Deep Neural Networks (DNNs) is gaining strength as a data-driven alternative to classical Minimum Mean Square Error (MMSE) enhancement approaches. In the past, Observation Uncertainty approaches to integrate MMSE speech enhancement with Automatic Speech Recognition (ASR) have yielded good results as a lightweight alternative for robust ASR. In this paper we thus exp...

متن کامل

Low-rank Representation for Enhanced Deep Neural Network Acoustic Models

Automatic speech recognition (ASR) is a fascinating area of research towards realizing humanmachine interactions. After more than 30 years of exploitation of Gaussian Mixture Models (GMMs), state-of-the-art systems currently rely on Deep Neural Network (DNN) to estimate class-conditional posterior probabilities. The posterior probabilities are used for acoustic modeling in hidden Markov models ...

متن کامل

Linear Prediction-based Dereverberation with Advanced Speech Enhancement and Recognition Technologies for the Reverb Challenge

This paper describes systems for the enhancement and recognition of distant speech recorded in reverberant rooms. Our speech enhancement (SE) system handles reverberation with blind deconvolution using linear filtering estimated by exploiting the temporal correlation of observed reverberant speech signals. Additional noise reduction is then performed using an MVDR beamformer and advanced model-...

متن کامل

Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech

Research on automatic speech recognition (ASR) of pathological speech is particularly hindered by scarce in-domain data resources. Collecting representative pathological speech data is difficult due to the large variability caused by the nature and severity of the disorders, and the rigorous ethical and medical permission requirements. This task becomes even more challenging for languages which...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • EURASIP J. Audio, Speech and Music Processing

دوره 2015  شماره 

صفحات  -

تاریخ انتشار 2015